# Ultra-low bit quantization

**Holo1 3B GGUF** (Other license): Holo1-3B is a Transformer-based multimodal model focused on visual document retrieval; it performs strongly on the WebVoyager benchmark while balancing accuracy and cost. *Image-to-Text · Transformers · English · Mungert · 583 downloads · 0 likes*

**Holo1 7B GGUF** (Apache-2.0): The Holo1-7B GGUF model is part of the Surfer-H system and targets multimodal tasks such as visual document retrieval; it is particularly strong at web-page interaction and web monitoring, achieving high accuracy at low cost. *Image-to-Text · Transformers · English · Mungert · 663 downloads · 0 likes*

**Qwq 32B ArliAI RpR V4 GGUF** (Apache-2.0): A text generation model based on Qwen/QwQ-32B, specialized for role-playing and creative writing, with support for ultra-low-bit quantization and long conversations. *Large Language Model · Transformers · English · Mungert · 523 downloads · 2 likes*

**Kanana 1.5 8b Instruct 2505 GGUF** (Apache-2.0): Kanana 1.5 is the latest release in the Kanana series, with significant improvements in coding, mathematics, and function calling; it handles inputs of up to 32K tokens natively and up to 128K tokens with YaRN. *Large Language Model · Transformers · Multilingual · Mungert · 606 downloads · 2 likes*

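The Kanana entry above notes a native 32K window that stretches to 128K tokens with YaRN. As a rough illustration of what that looks like in practice, here is a minimal llama-cpp-python sketch; the file name, context size, and scaling settings are assumptions for illustration, not values published for this model.

```python
import llama_cpp
from llama_cpp import Llama

# Sketch only: extend a 32K-native model to a ~128K window with YaRN rope scaling.
# File name and settings below are illustrative assumptions, not published values.
llm = Llama(
    model_path="kanana-1.5-8b-instruct-2505-q4_k_m.gguf",     # hypothetical local GGUF file
    n_ctx=131072,                                              # request a ~128K token window
    rope_scaling_type=llama_cpp.LLAMA_ROPE_SCALING_TYPE_YARN,  # constant name can vary across binding versions
    yarn_orig_ctx=32768,                                       # the model's native training context
)

out = llm("Summarize the following report:\n<very long text>", max_tokens=200)
print(out["choices"][0]["text"])
```
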
**Medgemma 4b It GGUF** (Other license): MedGemma-4B-IT is a medical multimodal model based on Gemma 3 that understands both medical text and images, suited to building medical AI applications. *Image-to-Text · Transformers · Mungert · 637 downloads · 2 likes*

**Medgemma 27b Text It GGUF** (Other license): MedGemma-27B-Text-IT is a medical-domain large language model based on the Gemma 3 architecture, optimized for medical text processing and offered in multiple quantization variants for different hardware. *Large Language Model · Transformers · Mungert · 1,464 downloads · 3 likes*

**Qwenlong L1 32B GGUF** (Apache-2.0): QwenLong-L1-32B is a large language model designed for long-context reasoning; trained with reinforcement learning, it performs strongly on several long-context question-answering benchmarks and handles complex reasoning tasks effectively. *Large Language Model · Transformers · Mungert · 927 downloads · 7 likes*

**Dans PersonalityEngine V1.3.0 24b GGUF** (Apache-2.0): Dans-PersonalityEngine-V1.3.0-24b is a multi-purpose model series fine-tuned on more than 50 specialist datasets, supporting multilingual and domain-specific tasks. *Large Language Model · Transformers · Mungert · 678 downloads · 2 likes*

**Qwen3 30B A6B 16 Extreme GGUF**: An ultra-low-bit quantized model derived from Qwen/Qwen3-30B-A3B-Base, supporting a 32K context length and suited to a range of hardware environments. *Large Language Model · Transformers · Mungert · 1,321 downloads · 1 like*

**Llama 3.1 Nemotron Nano 4B V1.1 GGUF** (Other license): Llama-3.1-Nemotron-Nano-4B-v1.1 is a large language model optimized from Llama 3.1 that strikes a good balance between accuracy and efficiency, suited to scenarios such as AI agents and chatbots. *Large Language Model · Transformers · English · Mungert · 2,177 downloads · 1 like*

**Opencodereasoning Nemotron 32B IOI GGUF** (Apache-2.0): A large language model based on Qwen2.5-32B-Instruct, post-trained specifically for code generation and reasoning, supporting a 32K context length and permitted for both commercial and non-commercial use. *Large Language Model · Transformers · Mungert · 1,317 downloads · 2 likes*

**UI TARS 1.5 7B GGUF** (Apache-2.0): UI-TARS-1.5-7B is a multimodal model that performs strongly on tasks such as image-text conversion; its quantization scheme maintains high accuracy at very low bit widths. *Text-to-Image · Transformers · Mungert · 2,526 downloads · 3 likes*

**Josiefied Qwen3 8B Abliterated V1 GGUF**: A quantized version of Qwen3-8B that uses IQ-DynamicGate ultra-low-bit quantization to improve memory efficiency and inference speed. *Large Language Model · Mungert · 559 downloads · 1 like*

**Phi 4 Mini Reasoning GGUF** (MIT): Phi-4-mini-reasoning is a lightweight open model built on synthetic data with a focus on high-quality, reasoning-dense examples, further fine-tuned for advanced mathematical reasoning. *Large Language Model · Transformers · Mungert · 3,592 downloads · 3 likes*

**Foundation Sec 8B GGUF** (Apache-2.0): Foundation-Sec-8B is a language model built for cybersecurity applications; based on the Llama-3.1 architecture and pre-trained on a large corpus of cybersecurity text, it understands the concepts, terminology, and practices of the security domain. *Large Language Model · Transformers · English · Mungert · 7,603 downloads · 4 likes*

**Qwen2.5 7B Instruct GGUF** (Apache-2.0): Qwen2.5-7B-Instruct is an instruction-tuned model based on Qwen2.5-7B, optimized for text generation and especially for chat scenarios. *Large Language Model · English · Mungert · 706 downloads · 4 likes*

**Olympiccoder 7B GGUF** (Apache-2.0): OlympicCoder-7B is a code generation model built on Qwen2.5-Coder-7B-Instruct; it uses IQ-DynamicGate ultra-low-bit quantization and is designed for memory-constrained environments. *Large Language Model · English · Mungert · 849 downloads · 3 likes*

**Phi 2 GGUF** (MIT): phi-2 is a text generation model offered with IQ-DynamicGate ultra-low-bit quantization (1-2 bits), suitable for natural language processing and code generation tasks. *Large Language Model · Multilingual · Mungert · 472 downloads · 2 likes*

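To put the ultra-low-bit claim in perspective, weight memory scales roughly with parameter count times bits per weight, so dropping from 16-bit to 2-bit weights cuts the footprint by about 8x. The sketch below is back-of-the-envelope arithmetic only; it ignores GGUF metadata, tensors kept at higher precision, and KV-cache memory.

```python
# Back-of-the-envelope estimate of weight memory for a quantized checkpoint.
# Ignores GGUF metadata, tensors kept at higher precision, and KV-cache memory.
def approx_weight_gib(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1024**3

N_PARAMS = 2.7e9  # phi-2 parameter count, per its public model card
for bpw in (16, 8, 4, 2, 1.75):  # fp16 baseline down to ultra-low-bit levels
    print(f"{bpw:>5} bits/weight: ~{approx_weight_gib(N_PARAMS, bpw):.2f} GiB")
```
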
**GLM Z1 32B 0414 GGUF** (MIT): GLM-Z1-32B-0414 is a 32B-parameter multilingual text generation model supporting Chinese and English, released under the MIT license. *Large Language Model · Multilingual · Mungert · 994 downloads · 3 likes*

**GLM 4 32B 0414 GGUF** (MIT): The GLM-4-32B-0414 GGUF release is a family of capable text generation models offered in a range of quantization formats to suit different hardware and memory budgets. *Large Language Model · Transformers · Multilingual · Mungert · 817 downloads · 4 likes*

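For releases like the one above that ship many quantization variants, the usual workflow is to pick the file that fits the available RAM or VRAM and load it directly. A minimal llama-cpp-python sketch of that workflow follows; the repository id, file pattern, and settings are placeholders rather than values confirmed by this listing.

```python
from llama_cpp import Llama

# Sketch only: download and load one quantization variant of a GGUF release.
# Repository id, file pattern, and settings are placeholders, not confirmed values.
llm = Llama.from_pretrained(
    repo_id="Mungert/GLM-4-32B-0414-GGUF",  # hypothetical Hugging Face repo id
    filename="*q4_k_m.gguf",                # pick the variant that fits your RAM/VRAM budget
    n_ctx=8192,
    n_gpu_layers=-1,                        # offload all layers to GPU if possible; 0 = CPU only
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF quantization in one sentence."}]
)
print(resp["choices"][0]["message"]["content"])
```

Lower-bit variants trade accuracy for memory, so a common approach is to start with a 4-bit file and only drop to 2-bit or below if the model does not otherwise fit.
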
**Llama 3.1 Nemotron 70B Instruct HF GGUF**: A model fine-tuned from Meta Llama-3.1-70B-Instruct and aligned with the NVIDIA HelpSteer2 dataset, supporting text generation tasks. *Large Language Model · English · Mungert · 1,434 downloads · 3 likes*

**Orpheus 3b 0.1 Ft GGUF** (Apache-2.0): An ultra-low-bit quantized model based on the Llama-3-8B architecture, using IQ-DynamicGate adaptive 1-2 bit quantization and suited to memory-constrained environments. *Large Language Model · English · Mungert · 1,427 downloads · 1 like*

**Olmo 2 0325 32B Instruct GGUF** (Apache-2.0): An instruction-tuned model based on OLMo-2-0325-32B-DPO, using IQ-DynamicGate ultra-low-bit quantization and optimized for memory-constrained environments. *Large Language Model · English · Mungert · 15.57k downloads · 2 likes*

**Qwen2.5 VL 3B Instruct GGUF**: Qwen2.5-VL-3B-Instruct is a 3B-parameter multimodal model for image-text generation tasks, with vision support specifically optimized for llama.cpp. *Text-to-Image · English · Mungert · 10.44k downloads · 8 likes*

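Vision-enabled GGUF builds such as the Qwen2.5-VL entry above generally ship the language model alongside a separate multimodal projector (mmproj) file, and llama.cpp-based runtimes pair the two at load time. The sketch below illustrates that pattern with llama-cpp-python's LLaVA-style chat handler; the handler class, file names, and prompt layout are assumptions, and the specifics for Qwen2.5-VL may differ.

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler  # illustrative; Qwen2.5-VL may need a different handler

# Pair the language model with its multimodal projector; both file names are placeholders.
chat_handler = Llava15ChatHandler(clip_model_path="qwen2.5-vl-3b-instruct-mmproj-f16.gguf")
llm = Llama(
    model_path="qwen2.5-vl-3b-instruct-q4_k_m.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,
)

resp = llm.create_chat_completion(messages=[{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "file:///path/to/page.png"}},
        {"type": "text", "text": "What does this document page say?"},
    ],
}])
print(resp["choices"][0]["message"]["content"])
```
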
**Llama 3.1 Nemotron Nano 8B V1 GGUF** (Other license): An 8B-parameter model based on the Llama-3 architecture, with memory usage reduced through IQ-DynamicGate ultra-low-bit quantization. *Large Language Model · Transformers · English · Mungert · 2,088 downloads · 4 likes*

**Mistral Small 3.1 24B Instruct 2503 GGUF** (Apache-2.0): An instruction-tuned model based on Mistral-Small-3.1-24B-Base-2503, distributed in GGUF format with IQ-DynamicGate ultra-low-bit quantization. *Large Language Model · Multilingual · Mungert · 10.01k downloads · 7 likes*

**Mistral 7B Instruct V0.2 GGUF** (Apache-2.0): Mistral-7B-Instruct-v0.2 is an instruction-tuned model based on the Mistral-7B architecture, supporting text generation tasks and optimized for memory efficiency with IQ-DynamicGate ultra-low-bit quantization. *Large Language Model · Mungert · 742 downloads · 2 likes*

**Mistral 7B Instruct V0.1 GGUF** (Apache-2.0): Mistral-7B-Instruct-v0.1 is a fine-tuned model based on Mistral-7B-v0.1 that supports text generation tasks; it uses IQ-DynamicGate ultra-low-bit quantization, making it suitable for memory-constrained deployments. *Large Language Model · Mungert · 632 downloads · 3 likes*
